On the Interaction Between True Source, Training, and Testing Language Models

Authors

  • Douglas B. Paul
  • James K. Baker
  • Janet M. Baker
Abstract

An interaction has been found between the true source language model, the training language model, and the testing language model. This interaction has implications for vocabulary-independent modeling, testing methodologies, discriminative training, and the adequacy of our current databases for continuous speech recognition (CSR) development. The current DARPA databases suffer from the described difficulties, which suggests that new CSR databases are needed if we are to further advance the state of the art.

*This work was sponsored by the Defense Advanced Research Projects Agency.

The Interaction During Training

When a category model (e.g. a context-free (CF) model such as a monophone) is used to model a set of subcategories (e.g. context-dependent (CD) models such as triphones), the category model becomes the subcategory prior-probability weighted average of the subcategory models:

$$M_{cat} = \sum_{subcat} p_{subcat} M_{subcat}$$

where $M$ denotes a model. (The mathematics used here are intended to be conceptual rather than rigorous. Thus models will be considered to be averages. In practice, the method for deriving a model from a set of sub-models or observations is highly dependent upon the form of model used.)

In a field such as speech recognition, where models are trained from exemplars, the subcategory model will generally be:

$$M_{subcat} = \frac{1}{N} \sum_{i=1}^{N} o_{subcat,i}$$

where $o_{subcat,i}$ is an observation emitted from the subcategory. $M_{cat}$ combines both the subcategory models and the prior probabilities of the subcategories, and similarly $M_{subcat}$ combines the observations and their (sampled) prior probabilities.

In speech recognition, a phone category would contain some set of subcategories, and a subcategory would be defined by some specific set of context factors. There are many factors which may be used to define the subcategories [3]; a commonly used set is triphone [18] subcategories and monophone categories. Alternatively, stressed and unstressed phones might be combined.
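The two averages above can be illustrated with a minimal sketch. It follows the text's conceptual simplification that a model is just an average of feature vectors; the function names, the triphone labels, and the toy observation values are all hypothetical and are not taken from the paper. When the subcategory priors are the sampled relative frequencies from training, the category model comes out identical to the mean of all pooled observations, which is one way to see how the training distribution is absorbed into the category model.

```python
def subcategory_model(observations):
    """M_subcat = (1/N) * sum_i o_subcat,i, computed per feature dimension."""
    n = len(observations)
    dim = len(observations[0])
    return [sum(o[d] for o in observations) / n for d in range(dim)]


def category_model(sub_models, priors):
    """M_cat = sum_subcat p_subcat * M_subcat, with priors normalized to sum to 1."""
    total = sum(priors)
    dim = len(sub_models[0])
    return [
        sum((p / total) * m[d] for p, m in zip(priors, sub_models))
        for d in range(dim)
    ]


# Hypothetical toy data: two triphone subcategories of one monophone category.
# The first context is seen three times in training, the second only once.
observations = {
    "t-ae+n": [[1.0, 2.0], [3.0, 4.0], [2.0, 3.0]],
    "k-ae+t": [[10.0, 10.0]],
}

sub_models = [subcategory_model(obs) for obs in observations.values()]
counts = [len(obs) for obs in observations.values()]  # sampled priors (assumption)

cat = category_model(sub_models, counts)
pooled = subcategory_model([o for obs in observations.values() for o in obs])

print(cat)     # [4.0, 4.75]
print(pooled)  # [4.0, 4.75]
```

In this toy case the category model lies much closer to the frequently observed subcategory (mean `[2.0, 3.0]`) than to the rare one (`[10.0, 10.0]`), because the weights are the training counts.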
(Note that this averaging is recursive: subcategories are the combination of some set of subsubcategories, and so on.)

We assume that speech is generated from some "true source" language model. (This language model would change as a function of many factors such as topic, history, and participants, but we will assume it to be constant for each task.) This true language model is known for some artificial tasks such as the DARPA Resource Management (RM) database [16], but can be estimated for naturally elicited speech and text if sufficient data is available. (However, current techniques for estimating language models are fairly rudimentary.) Since the acoustic realization of the phones will be a function of this true language model, any acoustic models averaged over any group of subcategories will learn this true language model to some degree. (Learning the language model "to some degree" may be viewed as favoring the more likely subcategories.)

Pragmatically, we have insufficient data to model all relevant subcategories separately and, even if we had sufficient data, we currently have insufficient computational resources to process all of it in any practical manner. Thus, since we must combine subcategories into larger models, a recognition system would favor the subcategories that were more commonly observed during training.

Implications for Performance Testing

Recognition is performed using some explicit language model. (No-grammar is a language model in which all following words are equally likely.) If the performance of a system is tested using a weaker language model than the true source language model, the acoustic models, if they have been affected by the training data language model, will strengthen the total language model in the recognizer. Thus, one would expect better recogni-




Publication date: 1990